Word embedding
TODO: an interesting paper with important references: https://arxiv.org/pdf/1702.01417.pdf

Word embedding is an assignment of a vector to each word in a language: W: \textrm{words} \rightarrow \mathbb{R}^n. Typically, the assignment is learned from a large corpus, and each vector is dense and has a relatively small dimensionality (for example, 200 to 500) compared to distributional semantics models.

Good practices to train word embeddings: see Lai et al. (2016) [Lai, S., Liu, K., He, S., & Zhao, J. (2016). How to generate a good word embedding. IEEE Intelligent Systems. https://doi.org/10.1109/MIS.2016.45]:

# "First, we discover that corpus domain is more important than corpus size. We recommend choosing a corpus in a suitable domain for the desired task, after that, using a larger corpus yields better results.
# Second, we find that faster models provide sufficient performance in most cases, and more complex models can be used if the training corpus is sufficiently large.
# Third, the early stopping metric for iterating should rely on the development set of the desired task rather than the validation loss of training embedding"

Characteristics

Proximity of similar words

Words in the high-dimensional space tend to form clusters of related meaning, and synonymous words lie closest to each other.

Algebraic relation

Some simple relations are found to be represented by a roughly constant difference vector across pairs of words. For example:

W(\textrm{woman}) - W(\textrm{man}) \approx W(\textrm{queen}) - W(\textrm{king})

Similar observations were made for capital-country, celebrity-job, president-country, chairman-company, ... [Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.]

Basis or the usage of sub-word features

TODO: Bian (2014) [Bian, J., Gao, B., & Liu, T. Y. (2014). Knowledge-powered deep learning for word embedding. In Machine Learning and Knowledge Discovery in Databases (pp. 132-148). Springer Berlin Heidelberg.], Qing Cui et al. (2014) [http://arxiv.org/pdf/1407.1687.pdf].

Sources of information

Text

Word2vec

Knowledge graph

Many methods combine text and knowledge graph to get better word embeddings: retrofitting (Faruqui et al. 2015) [Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., & Smith, N. A. (2015). Retrofitting Word Vectors to Semantic Lexicons. In NAACL 2015 (pp. 1606–1615). Denver, Colorado: ACL. http://doi.org/10.3115/v1/N15-1184], Liu et al. (2016) [Liu, Q., Jiang, H., Ling, Z.-H., Zhu, X., Wei, S., & Hu, Y. (2016). Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge.], Xu et al. (2014) [Xu, C., Bai, Y., Bian, J., Gao, B., Wang, G., Liu, X., & Liu, T. Y. (2014). RC-NET: A General Framework for Incorporating Knowledge into Word Representations.].

TODO:
* M. Yu, M. Dredze, Improving lexical embeddings with semantic knowledge, in: ACL (2), 2014, pp. 545–550.
* J. Bian, B. Gao, T.-Y. Liu, Knowledge-powered deep learning for word embedding, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2014, pp. 132–148.
* C. Xu, Y. Bai, J. Bian, B. Gao, G. Wang, X. Liu, T.-Y. Liu, RC-NET: A general framework for incorporating knowledge into word representations, in: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, ACM, 2014, pp. 1219–1228.
* Q. Liu, H. Jiang, S. Wei, Z.-H. Ling, Y. Hu, Learning semantic word embeddings based on ordinal knowledge constraints, in: Proceedings of ACL, 2015, pp. 1501–1511.
* Weston, Jason, et al. "Connecting language and knowledge bases with embedding models for relation extraction." arXiv preprint arXiv:1307.7973 (2013).

TODO: comparisons between methods???
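The algebraic relation described under Characteristics can be checked numerically. The following is a minimal sketch, not a result from any trained model: the 3-dimensional vectors are hand-picked toy values, and the names `W`, `sub`, `add`, `cosine` and `analogy` are made up for this illustration. It compares the two difference vectors by cosine similarity and then answers the analogy "man is to king as woman is to ?" by a nearest-neighbour search.

```python
import math

# Toy 3-dimensional vectors, hand-picked for illustration only (not learned).
W = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.15, 0.10],
    "man":   [0.10, 0.80, 0.20],
    "woman": [0.10, 0.10, 0.20],
    "apple": [0.00, 0.40, 0.90],
}


def sub(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def add(u, v):
    """Element-wise sum u + v."""
    return [a + b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest (by cosine) to W(b) - W(a) + W(c), excluding a, b, c."""
    target = add(sub(vectors[b], vectors[a]), vectors[c])
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# The two difference vectors should point in nearly the same direction.
gender_1 = sub(W["woman"], W["man"])
gender_2 = sub(W["queen"], W["king"])
print(cosine(gender_1, gender_2))

# "man is to king as woman is to ?"
print(analogy("man", "king", "woman", W))
```

With these toy values the two difference vectors are almost parallel and the analogy query returns "queen". On real embeddings the relation holds only approximately, and the result depends on the corpus and the model.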
Retrofitting

Faruqui et al. (2015): "we first train the word vectors independent of the information in the semantic lexicons and then retrofit them".

Inequality

From Liu et al. (2016): "the knowledge constraints are formulized as semantic similarity inequalities between two word pairs... semantic inequalities from WordNet: 1) Similarities between a word and its synonymous words are larger than similarities between the word and its antonymous words. A typical example is similarity(happy, glad) > similarity(happy, sad). 2) Similarities of words that belong to the same semantic category would be larger than similarities of words that belong to different categories. 3) Similarities between words that have shorter distances in a semantic hierarchy should be larger than similarities of words that have longer distances."

Models

* CBOW
* Skip-gram
* CLOW (continuous list of words): Trask et al. 2015 [Trask, A., Gilmore, D., & Russell, M. (2015). Modeling Order in Neural Word Embeddings at Scale. arXiv preprint arXiv:1506.02338.]
* PENN (partitioned embedding neural network): Trask et al. 2015

Applications

Adding word embeddings to a system

* Dependency parsing: Bansal et al. (2014) [Bansal, M., Gimpel, K., & Livescu, K. (2014). Tailoring continuous word representations for dependency parsing. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (Vol. 2, pp. 809-815).]: "We compare several popular embeddings to Brown clusters, via multiple types of features, in both news and web domains. We find that all embeddings yield significant parsing gains, including some recent ones that can be trained in a fraction of the time of others."
* Named-entity recognition:
** Tweets: Cherry and Guo (2015) [Cherry, C., & Guo, H. (2015). The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition. Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL, 735–745. http://doi.org/10.3115/v1/N15-1075]: "we build Brown clusters and word vectors, enabling generalizations across distributionally similar words ... Taken all together, we establish a new state-of-the-art on two common test sets"
** CVs: Tosik et al. (2015) [Tosik, M., Rotaru, M., Goossen, G., & Hansen, C. L. (2015). Word Embeddings vs Word Types for Sequence Labeling: the Curious Case of CV Parsing. Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, 123–128. http://doi.org/10.3115/v1/W15-1517]: "The best results on the extraction task are obtained by the model which integrates the word embeddings together with a number of hand-crafted features."

Word embeddings with a neural network

Evaluation

Most intrinsic evaluation datasets fail to predict extrinsic performance, except SimLex-999 (Chiu et al. 2016) [Chiu, B., Korhonen, A., & Pyysalo, S. (2016). Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance. Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, 1–6. http://doi.org/10.18653/v1/W16-2501]. Rogers et al. (2018) [Rogers, A., Ananthakrishna, S. H., & Rumshisky, A. (2018). What’s in Your Embedding, And How It Predicts Task Performance. Proceedings of the 27th International Conference on Computational Linguistics, 2690–2703.] study the relationship of intrinsic factors with performance on many different tasks.

Intrinsic evaluation

Datasets:
* Wordsim-353 (Finkelstein et al. 2001), MC-30 (Miller and Charles 1991), RG-65: small, old datasets that shouldn't be used any more. They also mix up similarity and relatedness.
* WS-Rel and WS-Sim (Agirre et al. 2009)
* MEN (Bruni et al. 2012)
* SimLex-999 (Hill et al. 2015)

Extrinsic evaluation

Nayak et al. (2016) [Nayak, N., Angeli, G., & Manning, C. D. (2016). Evaluating Word Embeddings Using a Representative Suite of Practical Tasks. Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, 19–23. http://doi.org/10.18653/v1/W16-2504] proposed a suite of tasks to evaluate word embeddings.

External links

* Christopher Olah. Deep Learning, NLP, and Representations. 2014

Source code

* Retrofitting: github
* gensim implementation of Word2vec
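As a rough illustration of the retrofitting idea (train vectors first, then pull them toward their neighbours in a semantic lexicon), here is a minimal pure-Python sketch. It is not the authors' released code. It assumes the commonly described iterative update in which each word's new vector is the average of its original vector and the mean of its lexicon neighbours' current vectors (that is, a weight of 1 on the original vector and 1/degree on each neighbour). The function names `retrofit` and `distance`, the toy 2-dimensional vectors and the tiny synonym lexicon are all made up for this example.

```python
import math


def retrofit(vectors, lexicon, iterations=10):
    """Pull each word vector toward its lexicon neighbours.

    vectors: dict mapping word -> list of floats (pre-trained embeddings)
    lexicon: dict mapping word -> list of related words (e.g. synonyms)
    Assumed weighting: 1 for the original vector, 1/degree per neighbour.
    Returns a new dict; the input vectors are left unchanged.
    """
    new = {word: list(vec) for word, vec in vectors.items()}
    for _ in range(iterations):
        for word, related in lexicon.items():
            if word not in vectors:
                continue
            neighbours = [n for n in related if n in new]
            if not neighbours:
                continue
            k = len(neighbours)
            updated = []
            for d in range(len(vectors[word])):
                neighbour_sum = sum(new[n][d] for n in neighbours)
                # Average of the original vector and the neighbour mean.
                updated.append((neighbour_sum + k * vectors[word][d]) / (2 * k))
            new[word] = updated
    return new


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy data, invented for illustration only.
original = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "sad": [-1.0, 0.0]}
synonyms = {"happy": ["glad"], "glad": ["happy"]}

fitted = retrofit(original, synonyms)
print(distance(original["happy"], original["glad"]))  # before retrofitting
print(distance(fitted["happy"], fitted["glad"]))      # after retrofitting
```

In this toy run the synonyms "happy" and "glad" move closer together while "sad", which has no lexicon entry, keeps its original vector. Because the update is a post-processing step, it can be applied to vectors from any of the models listed above.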